Indexing Weblogs One Post at a Time

Author

  • Natalie S. Glance
Abstract

To analyze weblogs, we must first identify the unit of a weblog that corresponds to a document. We argue in this paper that, for weblogs, the correct unit is the weblog post. A weblog post is a structured document with the following fields: date, timestamp, title, content, permalink, and author. We present our approach for segmenting weblogs into posts, which breaks down into three steps: (1) automatic feed discovery; (2) feed-guided segmentation, using the weblog feed and HTML; and (3) model-based weblog segmentation.
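
As a rough illustration of the post fields and of step (1), the sketch below defines a record with the six fields named in the abstract and looks for feed links in a weblog's HTML. It is not the paper's implementation: it assumes the common convention of advertising feeds through `<link rel="alternate">` tags with an RSS or Atom MIME type, and the names `WeblogPost`, `FeedLinkParser`, and `discover_feeds` are hypothetical.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin

# MIME types commonly used to advertise feeds (an assumption, not from the paper).
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


@dataclass
class WeblogPost:
    """Hypothetical record mirroring the fields listed in the abstract."""
    date: str
    timestamp: str
    title: str
    content: str
    permalink: str
    author: Optional[str] = None


class FeedLinkParser(HTMLParser):
    """Collects the href of every <link rel="alternate"> tag that points to a feed."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.feeds: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attribute values can be None (e.g. a bare attribute), so normalize them.
        attr = {name: (value or "") for name, value in attrs}
        rel_tokens = attr.get("rel", "").lower().split()
        is_feed_type = attr.get("type", "").lower() in FEED_TYPES
        href = attr.get("href", "")
        if "alternate" in rel_tokens and is_feed_type and href:
            # Resolve relative feed URLs against the weblog's own URL.
            self.feeds.append(urljoin(self.base_url, href))


def discover_feeds(html: str, base_url: str) -> List[str]:
    """Return absolute feed URLs advertised in the given HTML page."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds
```

A caller would fetch the weblog's home page, pass its HTML and URL to `discover_feeds`, and then use any feed found to guide segmentation of the page into `WeblogPost` records. Pages that advertise no feed would need a different approach, which this sketch does not cover.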

Similar resources

BlogPulse: Automated Trend Discovery for Weblogs

Over the past few years, weblogs have emerged as a new communication and publication medium on the Internet. In this paper, we describe the application of data mining, information extraction and NLP algorithms for discovering trends across our subset of approximately 100,000 weblogs. We publish daily lists of key persons, key phrases, and key paragraphs to a public web site, BlogPulse.com. In a...

Adaptive Weblog Post Filtering Based on User Browsing History

One of the most important Web-based services that established the foundations of the Web 2.0 is the weblog. Weblogs are evolving to be topic based systems that can lead to more revenue for companies. Therefore many companies provide free weblog hosting. Weblog popularity is an effective factor to gain more revenue. Weblogs have posts and topics that are arranged chronologically with the most re...

Mapping the Blogosphere in America

This short paper constitutes the first phase of a long-term project focused on probing American urban culture by examining the hyperlinks and text of personal weblogs. It discusses methods of extracting geographic location information from weblogs and ways of indexing weblogs to city units. After a brief introduction to the broader research plan, the paper proposes a process to automatically ex...

E-Tools to Assist EFL Learners' Writing Skill: Wikis, Weblogs, and Podcasts

One of the promises of web-based education is to help students take control of their learning pace as the basic requirement of language learning is being life-long. The purpose of the present study was to find out which of the e-tools -- weblogs, wikis, or podcasts -- can better help EFL learners excel in their writing skill. To this end, 156 Iranian sophomore students majoring in English and s...

Blogs Search Engine Using RSS Syndication and Fuzzy Parameters

The rapid development of the internet eventually increases the number of internet users triggering the need for an intelligent search engine that is able to minimize the search on world wide web (WWW) and find relevant information as requested. To overcome the issue of finding relevant information as well as minimizing the search on WWW, this paper proposes a search engine that is specifically ...

Publication date: 2006